Morphological Analysis of Historical Japanese Text
Similar resources
Corpus-based Japanese morphological analysis
The goal of this study is to improve corpus-based Japanese morphological analysis, which consists of word segmentation and part-of-speech (hereafter POS) tagging. We divide the problem of Japanese morphological analysis into three subproblems: models for known words, models for unknown words, and a corpus maintenance schema. First, we discuss Markov model-based approaches for known word processing. ...
Image Analysis for Historical Japanese Book Archives
This paper describes methods of image analysis for historical Japanese book archives with a dominant focus on character segmentation. The segmentation methodology includes stain and smear removal, binarization, character line extraction, and character extraction by region labeling with integration and separation techniques. The experimental results show that the proposed method can segment all ...
Morphological Analysis for Japanese Noisy Text based on Character-level and Word-level Normalization
Social media texts are often written in a non-standard style and include many lexical variants, such as insertions, phonetic substitutions, and abbreviations that mimic spoken language. The normalization of such a variety of non-standard tokens is one promising solution for handling noisy text. A normalization task is very difficult to conduct in Japanese morphological analysis because there are no ...
Corpus and Text Analysis of Spontaneous Japanese
The “Spontaneous Speech: Corpus and Processing Technology” project has three major parts: (1) compilation of a large spontaneous speech corpus, (2) establishment of spoken language engineering based on the corpus, and (3) development of a prototype spoken language summarization system. This paper describes how we help to develop this large corpus, i.e., (1), using technology developed a...
Morphological Analysis and Diacritical Arabic Text Compression
Morphological analysis of Arabic words allows decreasing the storage requirements of Arabic dictionaries, more efficient encoding of diacritical Arabic text, faster spelling, and efficient optical character recognition. All these factors allow efficient storage and archival of multilingual digital libraries that include Arabic texts. This paper presents a lossless compression algorithm based...
Journal
Journal title: Journal of Natural Language Processing
Year: 2013
ISSN: 1340-7619
DOI: 10.5715/jnlp.20.727